1 Introduction

This report presents global funding from the National Institute of Allergy and Infectious Diseases (NIAID) from 1990 to 2017. The data consist only of funding for projects on neglected tropical diseases worldwide. In addition to the neglected tropical diseases, we later added Influenza B, malaria, HIV/AIDS, and tuberculosis to illustrate differences between well-known illnesses and neglected tropical diseases. Using world map visualizations, we display the amount of funding received by each country during the study period. The GDP per capita of the funded countries is also illustrated. Two visualizations are shown below: a static map and an interactive map.

1.1 Funding and Number of Awards by Country from 1990 to 2017

1.1.1 A static map of funding by country

1.1.2 An interactive map of funding by country

In what follows, we construct the same map but make it interactive. Clicking on a marker shows the information for that location.

1.2 Funding on Neglected Tropical Diseases from 1990 to 2017

In this subsection, we categorized the diseases into groups. For instance, all forms of leishmaniasis were grouped under leishmaniasis, Trypanosoma cruzi was grouped under Chagas, and everything not covered by any of the categories illustrated below was grouped under “other”, for “other neglected tropical diseases”. Because malaria, Influenza B, TB, and HIV/AIDS are not considered neglected tropical diseases by either the CDC or the WHO, as shown on their respective websites, we removed them from all analyses and reviewed the disease progression.

2 Disease Burden

Burden of disease is a concept developed in the 1990s by the Harvard School of Public Health, the World Bank, and the World Health Organization (WHO) to describe death and loss of health due to diseases (WHO). Since then, the definition has expanded to include the metrics shown below:

  1. Disability-Adjusted Life Years (DALYs): One DALY is one lost year of “healthy life” due to disability, early death or ill-health. Summed across a population, DALYs measure the gap between the current health status and an ideal health situation in which the population lives free of disease (source: WHO).

  2. Years of Life Lost (YLLs): Number of deaths multiplied by the standard life expectancy at the age at which death occurs (source: WHO).

  3. Years Lost due to Disability (YLDs): Number of incident cases multiplied by the average duration of the disease and a weight factor that reflects the severity of the disease on a scale from 0 (perfect health) to 1 (dead) (source: WHO).

  4. Mortality: Number of deaths due to the disease.

  5. Incidence: Number of new cases of disease that develop in a given period of time (Source: MedicineNet).

  6. Prevalence: Number of cases of the disease that are present in a particular population at a given time (Source: MedicineNet).
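The YLL and YLD definitions above can be written out as a short Python sketch (Python is used here for illustration only; it is not the code behind this report). WHO defines DALYs as the sum of YLLs and YLDs. The function names and the input numbers are hypothetical.

```python
def years_of_life_lost(deaths, life_expectancy_at_death):
    """YLL: deaths times the standard life expectancy at the age of death."""
    return deaths * life_expectancy_at_death


def years_lost_to_disability(incident_cases, disability_weight, avg_duration_years):
    """YLD: incident cases times disability weight times average duration."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    return incident_cases * disability_weight * avg_duration_years


def dalys(yll, yld):
    """DALYs are the sum of years of life lost and years lost to disability."""
    return yll + yld


# Hypothetical inputs, chosen only to show the arithmetic.
yll = years_of_life_lost(deaths=100, life_expectancy_at_death=30)
yld = years_lost_to_disability(incident_cases=1000, disability_weight=0.2,
                               avg_duration_years=2)
print(yll, yld, dalys(yll, yld))
```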

These metrics are used to evaluate disease burden in the analysis. Recent disease burden estimates from the Global Burden of Disease (GBD) project are available with a two-year lag. Therefore, all comparisons with funding in this work use the same two-year gap (i.e. funding from “1992 to 2019” vs. disease burden from “1990 to 2017”). The disease burden data were obtained from the GBD project.
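The two-year gap can be sketched as a simple pairing of each burden year with the funding year two years later. This is a minimal Python illustration, not the report's own code; the dictionary layout, the function name, and the values are assumptions.

```python
LAG_YEARS = 2  # funding year = burden year + 2, as described above


def align_with_lag(funding_by_year, burden_by_year, lag=LAG_YEARS):
    """Pair each burden year with the funding recorded `lag` years later.

    Years without a matching funding entry are skipped.
    """
    pairs = []
    for burden_year in sorted(burden_by_year):
        funding_year = burden_year + lag
        if funding_year in funding_by_year:
            pairs.append((burden_year, funding_year,
                          burden_by_year[burden_year],
                          funding_by_year[funding_year]))
    return pairs


# Hypothetical values for illustration only.
funding = {1992: 1.0, 1993: 2.0, 1994: 3.0}
burden = {1990: 10.0, 1991: 20.0, 1993: 40.0}
for row in align_with_lag(funding, burden):
    print(row)
```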

2.1 Disease Burden and Funding in U.S. Dollars

In what follows, we compare disease burden with funding for the different disease categories. We first deflated (i.e. inflation-adjusted) all funding to 1992 values in order to see the true association between funding and disease burden. Inflation was calculated using the Consumer Price Index for All Urban Consumers (CPI-U), all items in U.S. city average, seasonally adjusted, with base period 1982-84=100, from the U.S. Bureau of Labor Statistics.
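A common way to deflate a nominal amount to a base year is to multiply it by the ratio of the base-year CPI to the CPI of the year in which the money was awarded. The Python sketch below assumes that approach. The function name is hypothetical and the CPI levels are placeholders, not actual BLS figures.

```python
BASE_YEAR = 1992  # all funding is expressed in 1992 dollars


def deflate_series(funding_by_year, cpi_by_year, base_year=BASE_YEAR):
    """Convert nominal funding to base-year dollars.

    real = nominal * CPI(base year) / CPI(award year)
    """
    cpi_base = cpi_by_year[base_year]
    return {year: amount * cpi_base / cpi_by_year[year]
            for year, amount in funding_by_year.items()}


# Placeholder index levels (NOT real CPI-U values), for illustration only.
cpi = {1992: 140.0, 2017: 280.0}
nominal = {1992: 1_000_000.0, 2017: 1_000_000.0}
print(deflate_series(nominal, cpi))
```

With these placeholder levels, prices double between 1992 and 2017, so a 2017 award is worth half as much in 1992 dollars.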

2.2 Regression analysis on Funding for 1997, 2007, 2017 with respect to DALYs of 1995, 2005, and 2015

Here, we performed a regression analysis with NIAID funding from 1997, 2007, and 2017 as the dependent variable and DALYs from 1995, 2005, and 2015, respectively, as the independent variable. Recall that, to account for inflation, all funding values were deflated to 1992 values. Two regression models were constructed: an unconstrained model (solid line) and a constrained model that assumes a disease with no burden receives no funding (i.e. no intercept; dashed line).
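The difference between the two models can be seen in their closed-form least-squares estimates. The Python sketch below assumes simple linear regression with a single predictor; the function names and data points are hypothetical, and this is not the report's own code.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def fit_through_origin(x, y):
    """Least squares for y = slope * x (zero burden implies zero funding)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data: x plays the role of DALYs, y of deflated funding.
dalys = [1.0, 2.0, 3.0, 4.0]
funding = [3.0, 5.0, 7.0, 9.0]
print(fit_with_intercept(dalys, funding))
print(fit_through_origin(dalys, funding))
```

Forcing the line through the origin changes the slope whenever the unconstrained intercept is non-zero, which is why the solid and dashed lines can diverge.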

2.3 Correlation between disease burden and awards

Spearman correlation was computed between funding and the disease burden metrics. The correlation table is attached separately.
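Spearman correlation is the Pearson correlation of the ranks of the two variables, with tied values given their average rank. The Python sketch below implements it from scratch for illustration; the function names and inputs are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical funding and burden values; the relationship is monotone
# but not linear, so the rank correlation is still perfect.
print(spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 400.0]))
```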

3 Distribution of Disease Burden by Country

In this section, we studied the level of disease burden in countries that received funding from NIAID. Our analysis focused on checking whether the level of disease burden correlates with a country’s poverty level. We used per capita GDP (obtained from the World Bank database) to evaluate each country’s poverty level. Because countries with larger populations may appear to have a higher burden, we extracted population size (again from the World Bank database) for 1990 to 2017 and averaged both population size and per capita GDP over that period. The average population size was used to calculate the per capita burden in a country (i.e. DALYs/population).
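The per capita calculation can be sketched in Python as follows. The sketch assumes a single DALY figure per country divided by that country's average population over the period; the function name and the numbers are hypothetical.

```python
from statistics import mean


def per_capita_burden(total_dalys, population_by_year):
    """DALYs divided by the average population over the available years."""
    return total_dalys / mean(list(population_by_year.values()))


# Hypothetical country with two yearly population entries.
population = {1990: 1_000_000, 2017: 3_000_000}
print(per_capita_burden(50_000, population))
```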

All the data were scaled for ease of comparison. We plotted both the data for all countries and a subset that excludes countries in the far northern or southern latitudes, since these lie outside the tropical regions and should not have neglected tropical diseases. These plots were, however, included separately in the presentation and thus are not in this RMD file.
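The text does not say which scaling was applied or how the tropical subset was selected. The Python sketch below assumes z-score scaling (centering on the mean and dividing by the sample standard deviation) and a latitude cut-off at roughly 23.44 degrees, the approximate position of the Tropics of Cancer and Capricorn. Both choices, the function names, and the country labels are assumptions made for illustration.

```python
from statistics import mean, stdev

# Assumed cut-off: approximate latitude of the Tropics of Cancer and Capricorn.
TROPIC_LATITUDE_DEG = 23.44


def z_scores(values):
    """Center on the mean and divide by the sample standard deviation."""
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


def tropical_subset(latitude_by_country, limit=TROPIC_LATITUDE_DEG):
    """Keep countries whose reference latitude lies within the tropics."""
    return sorted(name for name, lat in latitude_by_country.items()
                  if abs(lat) <= limit)


# Hypothetical values and placeholder country labels.
print(z_scores([1.0, 2.0, 3.0]))
print(tropical_subset({"A": 5.0, "B": -40.0, "C": 60.0, "D": -12.5}))
```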

4 Statistical analysis of the time series data

To model the trend of the global disease burden and predict future burden, we conducted time series modeling using forecasting tools. We combined the auto-regressive and moving average models in the so-called ARIMA (Auto-Regressive Integrated Moving Average) model. See Forecasting: Principles and Practice for a review of ARIMA analysis. First, cross-validation is performed to assess the validity of the constructed model; we then predict future burden and funding.
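To make the ARIMA idea concrete, the Python sketch below fits one small member of the family, ARIMA(1,1,0) with a constant: the series is differenced once, and each difference is regressed on the previous one by least squares. The report does not state which (p, d, q) orders were used, so this order is picked purely for illustration. In practice a forecasting library with automatic order selection would be used. All names and data here are hypothetical.

```python
def difference(series):
    """First differences: the 'integrated' (d = 1) part of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares fit of d_t = c + phi * d_(t-1) on the differences."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        # Constant differences: the model reduces to a pure drift.
        return y_mean, 0.0
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return y_mean - phi * x_mean, phi


def forecast_arima110(series, horizon):
    """Forecast `horizon` steps ahead with the fitted ARIMA(1,1,0) model."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        last_diff = c + phi * last_diff  # predict the next difference
        level += last_diff               # integrate back to the original scale
        forecasts.append(level)
    return forecasts


# Hypothetical burden series whose yearly changes halve each year.
print(forecast_arima110([0.0, 8.0, 12.0, 14.0, 15.0], horizon=2))
```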

4.1 Univariate time series analysis using ARIMA Model

4.2 Cross Validation
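
The text does not specify the cross-validation scheme. One common choice for time series is a rolling forecasting origin: the model is fitted on the first k observations, the next value is forecast and compared with the observed one, and k is then increased. The Python sketch below assumes that scheme. To stay short it uses a random walk with drift (equivalent to ARIMA(0,1,0) with a constant) as a stand-in forecaster; all names and data are hypothetical.

```python
def drift_forecast(train, horizon):
    """Random walk with drift: extend the average historical change."""
    drift = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + drift * (h + 1) for h in range(horizon)]


def rolling_origin_errors(series, forecast_fn, min_train=5, horizon=1):
    """Forecast errors at `horizon` steps ahead from each training window."""
    errors = []
    for end in range(min_train, len(series) - horizon + 1):
        train = series[:end]
        actual = series[end + horizon - 1]
        predicted = forecast_fn(train, horizon)[-1]
        errors.append(actual - predicted)
    return errors


def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


# Hypothetical series for illustration only.
series = [10.0, 12.0, 13.0, 15.0, 18.0, 19.0, 21.0, 24.0]
errors = rolling_origin_errors(series, drift_forecast)
print(len(errors), rmse(errors))
```

Any forecaster with the same `(train, horizon)` signature can be passed in place of `drift_forecast`, which makes it possible to compare candidate models on the same folds.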

4.3 Forecasting into the future

5 Text analysis

In what follows, we continue our analysis by focusing on the text variables. Data were queried in isearch using the term “disease burden”, resulting in 613 publications. We focused the analysis on “title”, “Mesh extracted”, “Abstracts”, and “Condition”. We intend to identify the most common organisms and the disease burden metrics that occur most often.
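
One simple way to find the most frequent organisms or burden metrics is to count how many publications mention each term in the selected text fields. The Python sketch below assumes that approach. The record layout, field keys, search terms, and sample records are all hypothetical and are not drawn from the 613 publications.

```python
import re
from collections import Counter

# Hypothetical field keys standing in for the text variables analyzed.
TEXT_FIELDS = ("title", "abstract")


def count_terms(records, terms, fields=TEXT_FIELDS):
    """Count the number of records that mention each term at least once."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = Counter()
    for record in records:
        text = " ".join(str(record.get(field, "")) for field in fields)
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1
    return counts


# Made-up records for illustration only.
records = [
    {"title": "Malaria DALYs", "abstract": "Prevalence of Plasmodium"},
    {"title": "A survey", "abstract": "prevalence only"},
]
print(count_terms(records, ["DALYs", "prevalence", "Plasmodium"]).most_common())
```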